Withdraw site resource when vm creation is failed #300
base: master
Conversation
…ource-scheduler into feature/resource
Resolved review threads:
globalscheduler/pkg/scheduler/framework/interfaces/framework.go (outdated)
globalscheduler/pkg/scheduler/framework/plugins/defaultbinder/default_binder.go (outdated)
globalscheduler/pkg/scheduler/framework/plugins/defaultbinder/default_binder.go
@@ -643,20 +653,28 @@ func (n *SiteCacheInfo) updateSiteFlavor(resourceTypes []string, regionFlavors m
	}
}

-func (n *SiteCacheInfo) deductFlavor() {
+func (n *SiteCacheInfo) updateFlavorCount(deduct bool) {
newly added function like this needs a unit test
Definitely. I will do it.
Update: I am working on the unit test case. It takes some time to finish, so I will make a separate PR.
@@ -135,3 +136,15 @@ func (s *Snapshot) Get(siteID string) (*schedulersitecacheinfo.SiteCacheInfo, er
func (s *Snapshot) GetFlavors() (map[string]*typed.RegionFlavor, error) {
	return s.RegionFlavorMap, nil
}

+func (s *Snapshot) GetRegionFlavors(region string) (map[string]*typed.RegionFlavor, error) {
newly added function like this needs a unit test
Definitely. I will do it.
was test added?
+// Bind binds pods to site using the k8s client.
+// Same function with Bind except return bound resource info
+func (b DefaultBinder) BindResource(ctx context.Context, state *interfaces.CycleState, stack *types.Stack,
newly added function like this needs a unit test
consider refactor some segments out into functions for unit testing
Definitely. I will do it.
need unit tests for some of the complex functions
I have some superficial comments. Please get another approval from @jshaofuturewei and/or @CoderKevinZhang before merging.
+//withdraw reserved resources to a pod & add it to cash to other pods
+func (sched *Scheduler) withdrawResource(podName string) error {
+	resource := sched.PodSiteResourceMap[podName]
map is not thread safe. When multiple events of one pod are triggered, how can we prevent synchronization issues here?
map is not thread safe. When multiple events of one pod are triggered, how can we prevent synchronization issues here?
This function is only related to pod creation failure, and pod creation is scheduled by the scheduling queue, which is synchronized.
My concern is that a scheduled job will sync resources (cpu & mem) from openstack. Before the informers get the failed pods from the apiserver, the scheduled job may have already synchronized from openstack, which means the resource has already been claimed back. In that case, is it possible that the failed pod adds resources back twice?
=> No, it won't happen, because PodSiteResourceMap only covers the 60-second time gap. Whenever we sync resources (cpu & mem) from openstack, PodSiteResourceMap will already be empty.
Please reply to my comments instead of editing them directly.
The assumption is that the informers take fewer than 60 seconds to trigger the events. However, if the resource map is not empty at the moment the scheduled resource synchronization job happens to fetch resource information from openstack, the resources might be wrong.
=> The resource map is empty, so the conflict won't happen. If this function creates too many issues, we should reconsider whether to remove it. It would be better to remove this 60-second gap, or the resource collector should collect information in advance instead of waiting 60 seconds. The 60 seconds is mainly the resource collector's issue; resource collection has to keep collecting information to remove this gap.
Please reply to my comments instead of editing them directly.
For example, when we get a failed pod event and want to withdraw resources, we assume the resource collector does not update the openstack resource information at that time. But if it does, we withdraw the resource information twice.
We can happen to get a failed pod event just after the resource collector updates the most recent openstack resource information, right?
Please correct me if I am wrong.
=> Right or not, collecting resource information is the resource collector's job. I tried to work around the resource collector's 60-second issue, but if it causes so many issues, it will be better for the resource collector to take it on and solve it.
Please reply to my comments instead of editing them directly.
The resource collector used to be the only source of the resource information (mem & cpu). If you introduce the current code to update resources, please make them work with each other. Thanks.
=> It will be better for you to check the resource collector's requirements.
Please reply to my comments instead of editing them directly.
Besides the resource collector, can you tell us whether this pull request also reduces/withdraws resources by updating the openstack resources? Thanks.
=> No, this PR is for "withdraw", which resolves the issue caused by the resource collector. It will be better to reconsider whether this function should be implemented in the scheduler or not.
selector := fields.ParseSelectorOrDie(
-	"status.phase=" + string(v1.PodAssigned) +
-		",status.assignedScheduler.name=" + schedulerName)
+	"status.assignedScheduler.name=" + schedulerName)
A pod which is scheduled can be watched by the informer. If there are millions of pods in the system, does the informer here have to watch all of them?
Due to a design change, the scheduler should watch pods which are: assigned, failed, bound, ...
Do we need to watch bound pods as well? I suppose that bound pods will be passed to dispatchers; they no longer need to be processed in the scheduler.
"status.phase!=" + string(v1.PodAssigned)
also add a comment here explaining the reason why these pods are being watched
do we need to watch the bound pods?
=> I think so. There is a function in scheduler.go:
func assignedPod(pod *v1.Pod) bool {
	return pod.Spec.VirtualMachine != nil && pod.Status.Phase == v1.PodBound
}
Please reply to my comments instead of editing them. Otherwise I do not get any notifications.
According to the code history, the HQ team added the condition to watch assigned pods at 1812947#diff-ffa1d9598838131f8ef28441851152f1434f44e5c396195b12ec36c8528162a0R48
You removed "status.phase=" + string(v1.PodAssigned). Did you ask the HQ team?
=> This part was not done by me. I remember you changed this part.
Please reply to my comments instead of editing them. Otherwise I do not get any notifications.
Can you just click the link? The HQ team member Jun Wang made the change.
=> That is why you have to talk to him, if he did.
Please reply to my comments instead of editing them. Otherwise I do not get any notifications.
Jun Wang made this change by adding "status.phase=" + string(v1.PodAssigned), and we think it is fine. You removed it, and we are asking why you removed the condition that Jun Wang added. Maybe you can ask Jun Wang for his permission.
=> Because you want to change this part, it will be better for you to discuss it with him.
Please reply to my comments instead of editing them. Otherwise I do not get any notifications.
I did not want to change it. I just added comments based on your changes removing the condition.
=> According to your comment, you need to talk to him.
Please reply to my comments instead of editing them. Otherwise I do not get any notifications.
I think you are the pull request owner and made the change. Can you talk to the HQ team, since your changes are based on Jun Wang's change? Thanks.
=> Because this change is your idea, it will be better for you to talk to HQ. I don't want to change it. Anyway, the selector is updated to:
selector := fields.ParseSelectorOrDie(
	"status.phase = " + string(v1.PodAssigned) +
		",status.phase = " + string(v1.PodFailed) +
		",status.assignedScheduler.name = " + schedulerName)
globalscheduler/pkg/scheduler/framework/interfaces/framework.go (outdated)
@@ -44,8 +44,9 @@ func (i *podInformer) Lister() corelisters.PodLister {
// NewPodInformer creates a shared index informer that returns only non-terminal pods.
func NewPodInformer(schedulerName string, client clientset.Interface,
	resyncPeriod time.Duration) coreinformers.PodInformer {
+	//This selector is to avoid to receive all pods event so that it improves scheduling performance.
Would you please add the steps to process the different pod status phases here, like failed, assigned, and bound?
=> The statuses have been added.
Please do not just add the statuses. Please add the STEPS to process each pod status phase.
=> If you have a specific preference on the comments, it will be better if you write them as you wish. Or please send me a paragraph, and I will add it.
Please reply to my comments instead of editing them.
I have already given you examples below:
- Failed. I assume their resources are withdrawn.
- Assigned. I assume they are waiting for clusters to bind.
- Bound. As Peng Du and I mentioned, we don't know how schedulers want to process bound pods.
=> If you want it, why don't you change it yourself? I don't think this is important.
We asked this question since the bound pods do not need to be watched. We want to confirm it with you. You are the pull request owner; can you please answer the question so that we know if we are right?
=> This PR is for revoking resources, not for "bound". The "Bound" pod check was already there. There should be a clear reason why we remove it if we have to, so you have to talk with HQ, who implemented it. And if it is OK to remove, we should redesign this part and do it.
Please reply to my comments instead of editing them.
Can we say the informer does not need to watch bound pods here? Is the reason that you did not want to add the condition "status.phase != Bound" that there is an existing bound pod check?
=> I will check if we can do it again. But if we cannot, we need to redesign it, and that takes quite some time to implement. I am not sure this is correct at this moment.
selector is updated:
selector := fields.ParseSelectorOrDie(
"status.phase = " + string(v1.PodAssigned) +
",status.phase = " + string(v1.PodFailed) +
",status.assignedScheduler.name = " + schedulerName)
@@ -44,8 +44,10 @@ func (i *podInformer) Lister() corelisters.PodLister {
// NewPodInformer creates a shared index informer that returns only non-terminal pods.
func NewPodInformer(schedulerName string, client clientset.Interface,
	resyncPeriod time.Duration) coreinformers.PodInformer {
+	//This selector is to avoid to receive unneccesary pods event (e.g. scheduled) so that it improves scheduling performance.
+	//This receives pod events only their status is one of failed, assigned, and bound
I mean how those pods with different status are processed.
- Failed. I assume their resources are withdrawn
- Assigned. I assume they are waiting for clusters to bind
- Bound. As Peng Du and I mentioned, we don't know how schedulers want to process bound pods.
=> Can you please change this part as you want if you have a specific opinion? Please change this comment as you wish.
Please reply to my comments instead of editing them.
As I mentioned before, we only watch pods we want to process. For failed pods and assigned ones, we know why the informers want to watch them, as I updated in the comments above. But you updated the informer watch conditions to watch bound pods, and we don't know why we need to watch bound pods. Maybe you can tell us?
=> In the last meeting we agreed that the selector will be "pod status != scheduled", and it is updated according to that discussion. The "Bound" check has been in the HQ source code from the very beginning; that is out of our scope currently.
Please reply to my comments instead of editing them.
Since you don't know how to add multiple conditions here, we suggest using "pod status != scheduled" as an alternative to "pod status == assigned OR pod status == bound OR pod status == failed". However, as Peng and I found out, the bound pods are not needed; we want to confirm it with you and make the changes.
=> You have to check on Monday if it is OK to remove the HQ code and whether there is any issue. Then we can remove it:
func assignedPod(pod *v1.Pod) bool {
	return pod.Spec.VirtualMachine != nil && pod.Status.Phase == v1.PodBound
}
Please reply to my comments instead of editing them.
I did not want to remove any code. I just want to add clear comments for your code changes. Jun Wang added a simple condition to search pods whose phase is Assigned, which we understand. If you want to include more status phases, like Failed, Bound, and Assigned, please let us know why.
=> You requested watching "Failed" pods to resolve the resource collector issue.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR proposes to withdraw the site (cluster) resources allocated to a pod when the pod's VM creation fails.
Which issue(s) this PR fixes:
Fixes #286
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
How to test
~/go/src/k8s.io/arktos$./hack/globalscheduler/globalscheduler-up.sh
open a new terminal
$cd ~/go/src/k8s.io/arktos/globalscheduler/test/yaml
$kubectl apply -f sample_6_clusters.yaml
$kubectl apply -f sample_2_schedulers.yaml
$kubectl apply -f sample_2_distributors.yaml
$kubectl apply -f sample_2_dispatchers.yaml
$kubectl apply -f sample_6_pods.yaml
$kubectl get pods